cDNA Library Construction and Expressed Sequence Tags
Abstract: Gentiana officinalis, an alpine plant, is one of the widely used traditional Tibetan medicines. In this study, total
RNA was extracted from whole flowering plants of this species, and a cDNA expression library was constructed using the
Creator™ SMART™ cDNA Library Construction Kit. The titer of the cDNA expression library was 1.2 × 10^[illegible] pfu/ml
and the recombination efficiency was 95.9%. The average length of the insert fragments in the library was longer than
500 bp. A total of 181 valid ESTs were obtained from random sequencing of 343 clones. Further bioinformatic analyses
suggested that they represented 144 unique sequences, of which 55 showed high homology to previously identified genes in
Gentianaceae or other plants, 35 matched other uncharacterized expressed sequence tags (ESTs), and 54 showed no good
match to available sequences in DNA databases. No protein matched the latter two classes of ESTs (89 sequences). The 55
ESTs with matched proteins were involved in a range of functions: protein expression (35%), photosynthesis (22%),
metabolism (18%), defense (11%), membrane transport (5%), cell division (5%), chromosome metabolism (2%) and signaling
components (2%). Finally, RT-PCR primers were designed from the valid ESTs to amplify cDNAs of G. officinalis, which
further verified the accuracy of the ESTs. This cDNA library provides a critical basis for further analyses of functional
genes and gene expression in this alpine species. In addition, these ESTs can be used to design functional nuclear
primers for studying the population genetics of this species and closely related species.


Key words: Gentiana officinalis; Genomic Research; cDNA library; Expressed Sequence Tags; Gentiana


Gentiana (Smith, 1936), a large genus in the Gentianaceae, is widely
distributed in the high mountains of the temperate regions of the world
(Ho and Liu, 2001). Many species of this genus have been used as
traditional Chinese and/or Tibetan medicines (Pharmacopoeia Commission
of PRC, 2000), for example "Long Dan" and "QingJiao". These medicines
are used to stimulate digestion and appetite and to relieve heartburn
and stomach ache (Van der Sluis et al., 1983; Tang and Eisenbrand,
1992). The main chemical constituents of these plants are
gentiopicroside and swertiamarin (Skrzypczak et al., 1993). Despite
diverse research on the resources, taxonomy and evolution of the genus
(Adams, 1995; Ho, 1985, 1988; Ho and Liu, 2001), its nuclear genes have
received little attention. A cDNA library and expressed sequence tags
(ESTs), obtained by sequencing randomly selected clones from the
library, provide an important basis for further functional analyses of
nuclear genes within a species (Ewing et al., 1999, 2000; Nelson et al.,
2000). In addition, EST-derived nuclear markers can be developed from
such ESTs and used to evaluate the genetic diversity and interspecific
relationships of plants (Ma et al., 2006; Zhou et al., 2007).

In this study, we aimed to construct a cDNA library and analyze the
characteristics of the ESTs of G. officinalis. This species occurs at
high altitudes and has been used as a traditional Tibetan medicine. To
our knowledge, this is the first cDNA library constructed and the first
set of ESTs sequenced for an alpine Gentiana species.


1 Materials and Methods 
1.1 Plant materials 

Flowering individuals of G. officinalis were used to extract total
RNA. Fresh leaves, flowers and roots were collected together, frozen in
liquid nitrogen, and stored at −80°C until RNA extraction.
1.2 RNA extraction and cDNA library construction

Total RNA was extracted using the Trizol Reagent Kit (Molecular
Research Center, Inc., USA). The subsequent steps followed the
manufacturer's recommendations for the Creator™ SMART™ cDNA Library
Construction Kit (Clontech, Mountain View, CA). First, first-strand cDNA
was synthesized with the CDS III/3′ primer using the SMART technique.
Long-distance polymerase chain reaction (LD PCR) was then used to
synthesize double-stranded cDNA, which was digested with SfiI and
size-fractionated on a CHROMA SPIN-400 column. cDNAs longer than 0.5 kb
were collected and ligated into the pDNR-LIB vector. The recombinant
plasmids were transformed into E. coli DHX. The quality of the cDNA
library was checked by conventional titer determination. Twenty-one
plaques were randomly picked and tested by PCR with universal M13
primers derived from the sequences flanking the cloning site of the
vector.
1.3 Sequence assembly, alignment and analysis

Clones for sequencing were selected randomly from the cDNA library,
and each clone was incubated at 37°C in 1.5 ml of LB broth overnight
with shaking. Plasmid DNA was isolated by the standard alkaline lysis
protocol using the Mini-plasmid kit (U-gene). The cDNA inserts were
subjected to single-pass partial sequencing from the 5′ end using the
5′-end sequencing primer and ABI chemicals on ABI 3730 DNA sequencers
(Shanghai Bioasia, PRC). We used MEGA 4.0 (Borland, America) to analyze
the EST sequences. Each EST was first processed with a multi-module
custom pipeline that linked sequence backup, base calling, elimination
of low-quality sequences and of sequences shorter than 100 bp, vector
trimming, and sequence assembly. The resulting unisequences were
compared against the non-redundant (nr) protein database at the protein
level using BLASTx with default parameters. In general, similarities
with E-values < 10^[illegible] were considered significant.
Unisequences showing no significant similarity to known genes (BLASTx
E-values > 10^[illegible]) were searched against the dbEST est_others
database (non-mouse, non-human) using the tBLASTx algorithm.
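The length filter and the two-stage similarity search described above can be sketched as follows. This is only an illustration of the decision logic, not the authors' pipeline: the class `Unisequence`, the function names and the value of `E_VALUE_CUTOFF` are assumptions (the actual cutoff is not legible in the source), and the E-values are taken as already parsed from BLAST output.

```python
from dataclasses import dataclass
from typing import Optional

MIN_LENGTH_BP = 100    # from the text: sequences shorter than 100 bp are eliminated
E_VALUE_CUTOFF = 1e-5  # ASSUMPTION: placeholder value, the real cutoff is illegible in the source


@dataclass
class Unisequence:
    """Hypothetical record for one assembled sequence and its best BLAST hits."""
    name: str
    sequence: str
    blastx_evalue: Optional[float] = None   # best BLASTx hit against nr (None = no hit)
    tblastx_evalue: Optional[float] = None  # best tBLASTx hit against est_others (None = no hit)


def passes_length_filter(sequence: str, min_length: int = MIN_LENGTH_BP) -> bool:
    """Keep only reads that are at least min_length bases long."""
    return len(sequence) >= min_length


def classify(useq: Unisequence, cutoff: float = E_VALUE_CUTOFF) -> str:
    """Protein match first; fall back to the EST database only when BLASTx is not significant."""
    if useq.blastx_evalue is not None and useq.blastx_evalue < cutoff:
        return "protein match"
    if useq.tblastx_evalue is not None and useq.tblastx_evalue < cutoff:
        return "EST match only"
    return "no significant match"


# Toy records with made-up E-values, for illustration only.
examples = [
    Unisequence("u1", "A" * 300, blastx_evalue=3e-75),
    Unisequence("u2", "A" * 250, blastx_evalue=0.5, tblastx_evalue=1e-30),
    Unisequence("u3", "A" * 150),
]
labels = [classify(u) for u in examples if passes_length_filter(u.sequence)]
```

The three labels correspond to the three classes of unique sequences reported in the Results (matches to known genes, matches to uncharacterized ESTs only, and no significant match).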
1.4 RT-PCR confirmation 

Reverse transcription (RT) was performed with 2 µg of the previously
isolated total RNA of G. officinalis. The cDNAs were synthesized
according to the instructions of the Reverse Transcriptase M-MLV
(RNase H−) Kit (Takara Biotechnology Co., Ltd). To validate the
accuracy of the ESTs, RT-PCR was performed using the cDNAs as templates
and a specific pair of primers designed for each selected gene. The
amplification conditions were 1 cycle of 2 min at 94°C, followed by 25
cycles of 94°C for 30 s, 52°C for 1 min, and 72°C for 1 min 30 s. PCR
products were electrophoresed on a 1.1% agarose/EtBr gel.
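As a small worked example, the RT-PCR program above can be written down as data and its nominal hold time summed. The helper `total_runtime_seconds` is a hypothetical name introduced here, and the result ignores ramp times between temperatures, which depend on the thermocycler.

```python
# Each step is (temperature in °C, hold time in seconds), as given in the text.
INITIAL_DENATURATION_S = 2 * 60                # 1 cycle of 2 min at 94 °C
CYCLE_STEPS = [(94, 30), (52, 60), (72, 90)]   # denaturation, annealing, extension
N_CYCLES = 25


def total_runtime_seconds(initial_s, cycle_steps, n_cycles, final_s=0):
    """Sum of programmed hold times; ramping between temperatures is not included."""
    per_cycle = sum(duration for _temperature, duration in cycle_steps)
    return initial_s + n_cycles * per_cycle + final_s


rt_pcr_seconds = total_runtime_seconds(INITIAL_DENATURATION_S, CYCLE_STEPS, N_CYCLES)
rt_pcr_minutes = rt_pcr_seconds / 60  # 120 s + 25 * 180 s = 4620 s, i.e. 77 min
```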


2 Results and Discussions 
2.1 Extraction and purification of total RNA 

The OD260/OD280 ratio of the total RNA was 1.98. The integrity of
the total RNA was then analyzed by agarose gel electrophoresis
(Fig. 1). As shown in Fig. 1, the 28S and 18S bands were distinct, and
the 28S band was about twice as bright as the 18S band.





Fig .1 Total RNA of Gentiana officinalis 


2.2 cDNA synthesis

First-strand cDNAs were synthesized from 2 µg of RNA. An aliquot of
the first-strand cDNA was then used to synthesize the ds cDNAs. After 22
thermal cycles, 5 µl of the 100 µl reaction was analyzed by agarose gel
electrophoresis (Fig. 2). The ds cDNAs appeared as a smear ranging
mainly from 500 to 5000 bp, indicating the good quality of the ds cDNA.
This method has also previously been shown to yield a higher percentage
of full-length cDNA.





Fig .2 ds cDNA synthesized using the SMART control reagents 


2.3 The quality of the cDNA library

The library titer was measured as 1.2 × 10^[illegible] pfu/ml. To
assess the size and diversity of the cDNA inserts in the constructed
library, 21 plaques were randomly selected and amplified with the M13
F/R primers (synthesized by TaKaRa Company; sense:
5′-GTAAAACGACGGCCAGT-3′, antisense: 5′-AACAGCTATGACCATG-3′) using the
following program: 94°C for 5 min; 35 cycles of 94°C for 30 s, 52°C for
30 s and 72°C for 1 min; 72°C for 10 min. PCR products were sized
against DNA markers (Fig. 3). The percentage of recombinants in the
library was 95.9%. The average length of the inserts was 900 bp. All
these analyses indicate that the RNAs are well represented in the cDNA
library.
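For orientation, the two M13 primers quoted above can be checked with a quick melting-temperature estimate. The sketch below uses the Wallace rule, Tm = 2(A+T) + 4(G+C), which is an assumption of this example (the paper does not say how the 52°C annealing temperature was chosen) and is only a rough guide for short oligonucleotides.

```python
M13_FORWARD = "GTAAAACGACGGCCAGT"  # sense primer from the text
M13_REVERSE = "AACAGCTATGACCATG"   # antisense primer from the text


def gc_count(primer: str) -> int:
    """Number of G and C bases in the primer."""
    primer = primer.upper()
    return primer.count("G") + primer.count("C")


def wallace_tm(primer: str) -> int:
    """Wallace-rule estimate in °C: 2 per A/T base plus 4 per G/C base."""
    gc = gc_count(primer)
    at = len(primer) - gc
    return 2 * at + 4 * gc


tm_forward = wallace_tm(M13_FORWARD)  # 8 A/T and 9 G/C bases -> 52
tm_reverse = wallace_tm(M13_REVERSE)  # 9 A/T and 7 G/C bases -> 46
```

Under this rule of thumb the forward primer comes out at 52°C and the reverse primer at 46°C.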
2.4 General characteristics of ESTs 

A total of 343 cDNA clones were selected randomly from the library
and single-pass sequences were generated. After excluding poorly
sequenced reads and/or reads shorter than 100 bases, a total of 181
ESTs were obtained. Contigs that consisted of one sequence were
considered singletons, while contigs comprising two or more sequences
were classified as redundant ESTs or contigs.





Fig.3 Composition of cDNA fragments in the libraries 
These EST sequences were clustered into 20 contigs and 124
singletons. Since the cDNA library was not amplified and the clones for
sequencing were not subtracted, the number of ESTs basically reflects
the prevalence of the corresponding mRNA (Fig. 4). Based on matches
with available data (Table 1), among the 144 unique sequences, 55
showed homology to previously identified genes in Gentianaceae or other
plants, 35 matched other uncharacterized expressed sequence tags
(ESTs), and 54 showed no significant matches to sequences present in
DNA databases. The latter two classes of ESTs (together 89 sequences)
also showed no corresponding protein match. Further analyses of the
ESTs with matched proteins showed that they
Fig. 4 Prevalence distribution of identified ESTs
(number of singletons is 124; numbers of contigs with two, three, four, five and six sequences are 12, 2, 2, 1 and 2, respectively)


Table 1 Database match of Gentiana officinalis ESTs to the genes of the other organisms

GenBank accession | Function | Score (bits) | E-value | Closest species | Copies
Cell division
AY611040 | NAM protein | 284 | 3.00E-75 | Picea glauca | 1
AY639034 | auxin-induced putative CP12 domain-containing protein | 72.1 | 1.00E-10 | Arachis hypogaea | 1
BK000123 | putative phytosulfokine peptide precursor | 95 | 4.00E-17 | Solanum lycopersicum | 1
Chromosome metabolism
EF520004 | sister chromatid cohesion 2 | 104 | 6.00E-20 | Arabidopsis thaliana | 1
Defense
AY173073 | catalase | 408 | 3.00E-128 | Hypericum perforatum | 1
DQ444292 | metallothionein-like protein | 61.6 | 6.00E-10 | Camellia sinensis | 2
DQ497591 | Gentiana siphonantha isolate 1 trnS-trnG intergenic spacer | 74.8 | 6.00E-20 | Gentiana siphonantha | 1
EU271754 | osmotin | 81.7 | 1.00E-13 | Piper colubrinum | 1
NM_111462 | RCI2A (rare-cold-inducible 2A) | 122 | 3.00E-25 | Arabidopsis thaliana | 1
NM_114701 | haloacid dehalogenase-like hydrolase family protein | 286 | 1.00E-74 | Arabidopsis thaliana | 1
Membrane transport
AF003347 | ATP phosphoribosyltransferase | 162 | 2.00E-37 | Thlaspi goesingense | 1
AF127442 | ATAF1-like protein | 250 | 1.00E-69 | Picea abies | 1
AF051222 | ATAF1-like protein | 297 | 3.00E-108 | Picea mariana | 1
Metabolism
AB027191 | isopentenyl pyrophosphate isomerase | 389 | 8.00E-106 | Gentiana lutea | 2
AB281494 | alpha/beta hydrolase fold superfamily | 301 | 2.00E-92 | Gentiana triflora | 2
AF367442 | NAD-dependent malate dehydrogenase | 144 | 4.00E-32 | Prunus persica | 1
AJ251269 | geraniol 10-hydroxylase | 143 | 1.00E-31 | Catharanthus roseus | 1
AM269122 | putative glycogenin | 383 | 5.00E-104 | Picea abies | 1
AM269142 | putative homeodomain leucine zipper protein | 377 | 4.00E-102 | Picea abies | 1
AM269256 | monooxygenase | 325 | 3.00E-87 | Picea abies | 1
AM269265 | monooxygenase | 225 | 2.00E-56 | Picea abies | 1
EU344848 | plastidic aldolase | 209 | 5.00E-64 | Solanum tuberosum | 1
LAUMTNADH | NADH dehydrogenase subunit 4 | 483 | 9.00E-134 | Lactuca sativa | 1
Photosynthesis
AB017366 | phytoene cyclase | 532 | 2.00E-154 | Gentiana lutea | 2
AB027191 | isopentenyl pyrophosphate isomerase | 389 | 8.00E-106 | Gentiana lutea | 2
AB236868 | ribulose-1,5-bisphosphate carboxylase/oxygenase small subunit | 325 | 2.00E-89 | Panax ginseng | 2
AF034631 | chlorophyll a/b binding protein LHCII type I precursor | 223 | 8.00E-58 | Panax ginseng | 1
AJ577578 | PSII K protein | 116 | 4.00E-44 | Olea europaea | 1
DQ781306 | chloroplast photosystem II light-inducible protein | 80.3 | 1.00E-19 | Pachysandra terminalis | 1
DQ887080 | photosystem I psaH protein | 232 | 2.00E-58 | Arachis hypogaea | 4
EF203260 | phytoene synthase 3 | 385 | 3.00E-104 | Gentiana lutea | 2
EU308517 | ribulose-1,5-bisphosphate carboxylase/oxygenase large subunit | 175 | 3.00E-41 | Silene aegyptiaca | 1
Table 1 (continued)

GenBank accession | Function | Score (bits) | E-value | Closest species
Photosynthesis (continued)
X66727 | P. taeda gene for protochlorophyllide reductase | 85.8 | 8.00E-43 | Pinus taeda
X95987 | PSII polypeptide | 184 | 4.00E-49 | Solanum lycopersicum
Protein expression
AB236868 | ribulose-1,5-bisphosphate carboxylase/oxygenase small subunit | 325 | 2.00E-89 | Panax ginseng
AB236868 | ribulose-1,5-bisphosphate carboxylase/oxygenase small subunit | 332 | 2.00E-88 | Panax ginseng
AB237912 | ribosomal protein S12 | 153 | 3.00E-36 | Nicotiana sylvestris
AF127593 | putative 60S ribosomal protein L13a | 412 | 4.00E-122 | Picea abies
AF479180 | 26S ribosomal RNA | 189 | 1.00E-45 | Exacum affine
AF479180 | 26S ribosomal RNA | 245 | ?.00E-62 | Exacum affine
AJ316582 | ribosomal protein S12 | 135 | ?.00E-36 | Atropa belladonna
AM111313 | Plantago major mRNA for histone H3 | 293 | ?.00E-77 | Plantago major
AP009123 | ribosomal protein S12 | 259 | ?.00E-74 | Gossypium barbadense
AP009374 | ribosomal protein S12 | 319 | ?.00E-88 | Lepidium virginicum
DQ176643 | Vitis pseudoreticulata clone EST-423 23S ribosomal RNA | 132 | ?.00E-29 | Vitis pseudoreticulata
DQ673255 | ribosomal protein S12 | 116 | ?.00E-36 | Jasminum nudiflorum
DQ673255 | ribosomal protein S12 | 116 | ?.00E-39 | Jasminum nudiflorum
DQ629362 | large subunit ribosomal RNA | 348 | ?.00E-93 | Sabia sp. Qiu 91025
EF207443 | ribosomal protein S19 | 250 | ?.00E-64 | Cercidiphyllum japonicum
EF207453 | ribosomal protein L2 | 347 | ?.00E-93 | Peridiscus lucidus
EU118126 | ribosomal protein S12 | 182 | ?.00E-44 | Ipomoea purpurea
EU301782 | Daucus carota 26S ribosomal RNA gene | 192 | ?.00E-47 | Daucus carota
EU431223 | ribosomal protein S12 | 349 | ?.00E-93 | Carica papaya
Signaling components
AY936336 | NdhC | 261 | 5.00E-67 | Operculina aequisepala
Unclassified
AC139600 | unknown | 244 | ?.00E-62 | Medicago truncatula
AC187538 | unknown | 329 | ?.00E-87 | Solanum lycopersicum
AK224216 | unknown | 309 | ?.00E-81 | Oryza punctata
AK224613 | unknown | 137 | ?.00E-30 | Solanum lycopersicum
AK246203 | unknown | 159 | ?.00E-36 | Solanum lycopersicum
AK246262 | unknown | 280 | ?.00E-75 | Solanum lycopersicum
AK246799 | unknown | 230 | ?.00E-61 | Solanum lycopersicum
AK246799 | unknown | 229 | ?.00E-61 | Solanum lycopersicum
AK251177 | unknown | 242 | ?.00E-62 | Hordeum vulgare
AL606457 | unknown | 105 | ?.00E-49 | Oryza sativa Japonica Group
AM425978 | unknown | 241 | ?.00E-61 | Vitis vinifera
AM454485 | unknown | 183 | ?.00E-43 | Vitis vinifera
AM462697 | unknown | 48.7 | ?.00E-27 | Vitis vinifera
AM482227 | unknown | 61.1 | ?.00E-12 | Vitis vinifera
AP004898 | unknown | 199 | ?.00E-54 | Lotus japonicus
AP008212 | unknown | 155 | ?.00E-59 | Oryza sativa Japonica Group
AP008218 | unknown | 115 | ?.00E-23 | Oryza sativa Japonica Group
AY142543 | unknown | 238 | ?.00E-66 | Arabidopsis thaliana
AY142543 | unknown | 238 | ?.00E-66 | Arabidopsis thaliana
CT831892 | unknown | 376 | ?.00E-101 | Oryza sativa Indica Group
CU223189 | unknown | 177 | ?.00E-49 | Populus tremula
CU224065 | unknown | 143 | ?.00E-31 | Populus tremula
CU224528 | unknown | 54.7 | ?.00E-22 | Populus tremula
DQ226906 | unknown | 79 | ?.00E-12 | Boechera divaricarpa
EF085796 | unknown | 199 | ?.00E-49 | Picea sitchensis
EF087806 | unknown | 69.3 | ?.00E-09 | Picea sitchensis
EF146998 | unknown | 90.9 | ?.00E-36 | Populus trichocarpa
EF147066 | unknown | 72.5 | ?.00E-17 | Populus trichocarpa
EF148586 | unknown | 142 | ?.00E-31 | Populus trichocarpa
EF534108 | unknown | 105 | ?.00E-22 | Beta vulgaris
NM_001050318 | unknown | 113 | ?.00E-23 | Oryza sativa Japonica Group
NM_001050861 | unknown | 91.3 | ?.00E-27 | Oryza sativa Japonica Group
NM_111205 | unknown | 139 | ?.00E-39 | Arabidopsis thaliana
NM_124490 | unknown | 65.7 | ?.00E-22 | Arabidopsis thaliana
Y08501 | unknown | 62.5 | ?.00E-14 | Arabidopsis thaliana

are involved in protein expression (35%), photosynthesis (22%),
metabolism (18%), defense (11%), membrane transport (5%), cell
division (5%), chromosome metabolism (2%) and signaling components
(2%) (Fig. 5).
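The counts reported in this section can be cross-checked with a few lines of arithmetic. The sketch below only uses numbers stated in the text; the definition of redundancy as one minus the ratio of unique sequences to valid ESTs is an assumption of this example, not a measure reported by the authors.

```python
TOTAL_CLONES = 343   # clones sequenced
VALID_ESTS = 181     # ESTs remaining after quality and length filtering
CONTIGS = 20
SINGLETONS = 124

# Match classes of the unique sequences, as listed in the text.
MATCH_CLASSES = {
    "known gene": 55,
    "uncharacterized EST": 35,
    "no significant match": 54,
}

unique_sequences = CONTIGS + SINGLETONS                 # 144
classified_total = sum(MATCH_CLASSES.values())          # should also equal 144
sequencing_success = VALID_ESTS / TOTAL_CLONES          # fraction of clones giving a valid EST
redundancy = 1 - unique_sequences / VALID_ESTS          # ASSUMED definition of redundancy

class_fractions = {
    name: count / unique_sequences for name, count in MATCH_CLASSES.items()
}

print(f"unique sequences: {unique_sequences}")
print(f"sequencing success: {sequencing_success:.1%}")
print(f"redundancy: {redundancy:.1%}")
for name, fraction in class_fractions.items():
    print(f"{name}: {fraction:.1%}")
```

With these inputs, the contigs and singletons add up to the 144 unique sequences stated in the text, and the three match classes account for all of them.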
2.5 Verification of the ESTs accuracy using RT-PCR
To verify the accuracy of the ESTs, we designed 12 pairs of RT-PCR
primers (Table 2) according to valid EST sequences selected randomly
from Table 1. These primers were used to amplify the cDNAs. The
expected products were obtained for each pair of primers (Fig. 6).

In conclusion, we successfully constructed a cDNA library of G.
officinalis. This library will provide a basis for cloning functional
genes of this species in the future. In addition, the ESTs can be used
to design primers that amplify nuclear fragments of this and closely
related species for studies of population genetics and interspecific
relationships at the genomic level.




Fig. 5 Functional classification of ESTs according to the tBLASTx search results
(1. Protein expression 35%; 2. Photosynthesis 22%; 3. Metabolism 18%; 4. Defense 11%; 5. Membrane transport 5%;
6. Cell division 5%; 7. Chromosome metabolism 2%; 8. Signaling components 2%)


Table 2 Semi-quantitative RT-PCR primers designed according to the G. officinalis ESTs

Primers | Match accession | Product length (bp) | Forward primer (5′→3′) | Reverse primer (5′→3′)
G1 | AB281494 | 632 | CGTTATGGAAGAGTACACAGCA | TCACTCCCCTCAAACATAACAC
G2 | EF520004 | 1230 | CCTCACATTACACCTTGCTACA | GCGAATAAGCAAGCCAAGAC
G3 | DQ226906 | 1047 | AATGGTTCACCGATTCCCCC | CTTTACATAAACTCGACAGG
G4 | AF051222 | 900 | TCGTCAAACAAGTCAATCCG | GCTTTCCAGTATCCAGAACCAG
G5 | AF003347 | 1163 | CAAATACCATTTTACAAAATTC | GGCACAAGGCAAGAAGGCGA
G6 | AB027191 | 755 | TTTGCTGCATAGAGCGTTTAGC | ATGTCAGTAGCTTCTTGGAGGG
G7 | EF203260 | 692 | GCGTCACACATAAATCCAACCG | ACTCCTTTCTCTGCATCATCATAG
G8 | AB017366 | 975 | GTGCGTGCGCTGAGATATTA | CACGCTCCATAGAATCGTCATC
G9 | AB017366 | 1145 | GTGCGTGCGCTGAGATATTA | CACGCTCCATAGAATCGTCATC
G10 | AF034631 | 601 | GTCTCTTCTTCCTTCTCAGT | TTTATTTGATGGAGTTCGAA
G11 | EU853017 | 985 | GCTACGCCAGGTATTACCCA | GATTTCCCCGTTCTCTCTCC
G12 | EU344848 | 1167 | AATGGTTCACCGATTCCCCC | CTTTACATAAACTCGACAGG

Fig. 6 RT-PCR results
(from right to left: the 2000 marker and the products of primer pairs G1 to G12)